Fitting Individual Countries Time Series Trajectories

Lets do some stuff from Solomon Kurz’s other bookdown, Applied Longitudinal Data Analysis in brms and the tidyverse, to see if we can visualize our target variable a bit better. I’m also using some techniques from Hadley Wickham’s nice youtube video, Managing many models with R.

#read the file, if you dont have it from the previous markdown. 
aei <- read.csv("/Volumes/RachelExternal/Thesis/Data_upload_for_CL/AEI.csv")

With Priors

by_ISO <-
  aei %>%
  filter(!is.na(irrperc)) %>% 
  group_by(ISO) %>%
  nest()

Prior Prediction

Doing a little prior plotting, I’ve messed around a bit and settled on some semi-sensible priors assuming a gaussian distribution for both the parameter and slope. I am assuming that our intercept is normally distributed with around a mean of 2 and a standard variation of 2, our beta coef is centered around 0.01 with a sd of 0.1. This produces irrperc values within an acceptable range (roughly 0-15%). There are some negative values here. Perhaps a prior that is bounded by 0 would be a better fit for this, but experiments with log normal distributions have proved difficult. Also, none of the countries have negative trajectories of irrigation expansion, but some do have a decrease towards the end of the study period.

set.seed(17)
N <- 50
a <- rnorm(N , 2, 2)
b <- rnorm( N , 0., 0.1 )

plot( NULL , xlim=range(aei$yearcount) , ylim=c(-50,50) , xlab="year" , ylab="Irrigation Percentage" )
abline( h=0 , lty=2 )
for ( i in 1:N ) curve( a[i] + b[i]*x ,
from=min(aei$yearcount) , to=max(aei$yearcount) , add=TRUE , col=col.alpha("black",0.2) )

These don’t look too bad. There are some lines that predict negative values but in general they seem to be positive and have a very general upward slope.

First Model

Here were fitting the first model which is just dependent on the year count and the priors we specified above..

\[ \begin{aligned} irrperc_c &\sim N(\mu_c, \sigma_c) \\ \mu_c &= \alpha_c + \beta_c*yearcount \\ \alpha_c &\sim N(2,2) \\ \beta_c &\sim N(0.01, 0.1) \\ \sigma_c &\sim exp(1) \end{aligned} \]

I won’t use the first country, as we have 0 for the irrigation percent. AFG is the second country, and there is some evolution.

AFG_norm_yearcount <-
  brm(data = by_ISO$data[[2]], #AFG
      formula = irrperc ~ yearcount,
      control = list(adapt_delta = 0.99),
      prior = c(prior(normal(2,2), class = Intercept),
                prior(normal(0.01, 0.1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 4000, chains = 4, cores = 4,
      seed = 17,
      file = "/Volumes/RachelExternal/Thesis/Thesis/fits/CL_Fits/AFG_norm_d_yearcount")

Yep looks fine here! Check these posterior and the priors, just to see that brms is putting them in the right place…

print(AFG_norm_yearcount, digits = 4)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: irrperc ~ yearcount 
##    Data: by_ISO$data[[2]] (Number of observations: 8) 
## Samples: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
##          total post-warmup samples = 8000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI   Rhat Bulk_ESS Tail_ESS
## Intercept   3.4249    0.2173   2.9896   3.8606 1.0003     4274     3373
## yearcount   0.0335    0.0073   0.0187   0.0482 1.0009     4227     3587
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI   Rhat Bulk_ESS Tail_ESS
## sigma   0.2838    0.1036   0.1561   0.5457 1.0018     2119     2873
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
prior_summary(AFG_norm_yearcount)
##              prior     class      coef group resp dpar nlpar bound       source
##  normal(0.01, 0.1)         b                                               user
##  normal(0.01, 0.1)         b yearcount                             (vectorized)
##       normal(2, 2) Intercept                                               user
##     exponential(1)     sigma                                               user

Yep, all good.

Now for the first part of the master plan, apply this to all countries, resulting in individual country trajectories, individual slopes and intercepts, but calculated only with the 8 data points for each country. Use map here to apply this to every ISO.

models <- 
  by_ISO %>%
  mutate(model = map(data, ~update(AFG_norm_yearcount, newdata = ., seed = 2)))

This runs, and for some of the models it yells that things didn’t converge.. and I’m just going to leave it, as this is not really the most importatnt part.

Calculation of Mean Structure

Again, using the code suggested form S. Kurz, the intercept/intercept standard deviation and the rate of change/rate of change sd can be extracted from the estimates and coeffs.

mean_structure <-
  models %>% 
  mutate(coefs = map(model, ~ posterior_summary(.)[1:2, 1:2] %>% 
                       data.frame() %>% 
                       rownames_to_column("coefficients"))) %>% 
  unnest(coefs) %>% 
  select(-data, -model) %>% 
  unite(temp, Estimate, Est.Error) %>% 
  pivot_wider(names_from = coefficients,
              values_from = temp) %>% 
  separate(b_Intercept, into = c("init_stat_est", "init_stat_sd"), sep = "_") %>% 
  separate(b_yearcount, into = c("rate_change_est", "rate_change_sd"), sep = "_") %>% 
  mutate_if(is.character, ~ as.double(.) %>% round(digits = 2)) %>% 
  ungroup()

Calculation of Residual Variance

residual_variance <-
  models %>% 
  mutate(residual_variance = map_dbl(model, ~ posterior_summary(.)[3, 1])^2) %>% 
  mutate_if(is.double, round, digits = 2) %>% 
  select(ISO, residual_variance)

Calculation of Bayesian \(R^2\)

r2 <-
  models %>% 
  mutate(r2 = map_dbl(model, ~ bayes_R2(., robust = T)[1])) %>% 
  mutate_if(is.double, round, digits = 2) %>% 
  select(ISO, r2)

Fit

table <-
  models %>% 
  unnest(data) %>% 
  group_by(ISO) %>% 
  slice(1) %>% 
  select(ISO) %>% 
  left_join(mean_structure,    by = "ISO") %>% 
  left_join(residual_variance, by = "ISO") %>% 
  left_join(r2,                by = "ISO") %>% 
  rename(residual_var = residual_variance) %>% 
  select(ISO, init_stat_est:r2, everything()) %>% 
  ungroup()

table %>% 
  knitr::kable()
ISO init_stat_est init_stat_sd rate_change_est rate_change_sd residual_var r2
ABW 0.00 0.00 0.00 0.00 0.00 0.50
AFG 3.43 0.22 0.03 0.01 0.08 0.85
AGO 0.06 0.00 0.00 0.00 0.00 0.71
ALB 7.09 2.12 0.09 0.07 12.22 0.27
AND -0.08 0.10 0.01 0.00 0.02 0.70
ARE -0.10 0.60 0.06 0.02 0.66 0.65
ARG 0.39 0.04 0.01 0.00 0.00 0.88
ARM 8.92 0.51 0.02 0.02 0.44 0.36
ASM 0.00 0.00 0.00 0.00 0.00 0.50
ATG 0.22 0.25 0.00 0.01 0.10 0.13
AUS 0.18 0.04 0.01 0.00 0.00 0.91
AUT 0.40 0.12 0.02 0.00 0.02 0.89
AZE 12.45 2.03 0.09 0.04 2.31 0.85
BDI 0.47 0.07 0.01 0.00 0.01 0.71
BEL 0.25 0.14 0.01 0.00 0.03 0.73
BEN 0.02 0.02 0.00 0.00 0.00 0.86
BFA 0.00 0.01 0.00 0.00 0.00 0.93
BGD 3.10 2.99 0.09 0.10 143.96 0.02
BGR 7.51 2.14 -0.07 0.07 12.27 0.10
BHR 0.48 0.86 0.10 0.03 1.35 0.71
BHS 0.01 0.03 0.00 0.00 0.00 0.70
BIH 0.01 0.01 0.00 0.00 0.00 0.83
BLR 0.49 0.17 0.00 0.01 0.05 0.20
BLZ 0.00 0.01 0.00 0.00 0.00 0.94
BMU 0.00 0.00 0.00 0.00 0.00 0.50
BOL 0.07 0.02 0.00 0.00 0.00 0.64
BRA 0.01 0.04 0.01 0.00 0.00 0.95
BRB 1.58 1.65 0.19 0.06 6.94 0.63
BRN 0.02 0.05 0.00 0.00 0.00 0.70
BTN 0.21 0.11 0.02 0.00 0.02 0.87
BWA 0.00 0.00 0.00 0.00 0.00 0.14
CAF 0.00 0.00 0.00 0.00 0.00 0.78
CAN 0.05 0.01 0.00 0.00 0.00 0.90
CHE 0.59 0.07 0.02 0.00 0.01 0.94
CHL 1.30 0.17 0.03 0.01 0.05 0.88
CHN 3.41 0.19 0.07 0.01 0.06 0.97
CIV 0.02 0.02 0.01 0.00 0.00 0.94
CMR 0.01 0.00 0.00 0.00 0.00 0.94
COD 0.00 0.00 0.00 0.00 0.00 0.87
COG 0.00 0.00 0.00 0.00 0.00 0.83
COL 0.11 0.09 0.02 0.00 0.01 0.90
COM -0.01 0.02 0.00 0.00 0.00 0.72
CPV 0.42 0.07 0.01 0.00 0.01 0.77
CRI 0.34 0.14 0.04 0.00 0.03 0.96
CUB 3.23 0.87 0.13 0.03 1.58 0.78
CYM 0.00 0.00 0.00 0.00 0.00 0.50
CYP 2.38 0.36 0.06 0.01 0.24 0.84
CZE 0.50 0.52 0.02 0.02 0.49 0.27
DEU 2.13 1.05 0.01 0.04 2.21 0.07
DJI 0.04 0.00 0.00 0.00 0.00 0.46
DMA 0.00 0.00 0.00 0.00 0.00 0.50
DNK 2.42 1.45 0.19 0.05 5.22 0.69
DOM 2.94 0.15 0.07 0.01 0.03 0.98
DZA 0.07 0.03 0.00 0.00 0.00 0.83
ECU 1.42 0.15 0.05 0.01 0.04 0.96
EGY 2.70 0.02 0.02 0.00 0.00 0.99
ERI 0.01 0.02 0.01 0.00 0.00 0.92
ESP 4.20 0.40 0.06 0.01 0.27 0.85
EST 0.21 0.12 0.00 0.00 0.02 0.13
ETH 0.01 0.04 0.01 0.00 0.00 0.92
FIN 0.02 0.04 0.01 0.00 0.00 0.88
FJI -0.01 0.03 0.00 0.00 0.00 0.78
FRA 0.65 0.28 0.10 0.01 0.14 0.96
FRO 0.00 0.00 0.00 0.00 0.00 0.50
FSM 0.00 0.00 0.00 0.00 0.00 0.50
GAB 0.02 0.00 0.00 0.00 0.00 0.50
GBR 0.32 0.11 0.01 0.00 0.02 0.80
GEO 3.93 0.43 0.07 0.01 0.32 0.84
GHA 0.03 0.04 0.00 0.00 0.00 0.72
GIB 0.00 0.00 0.00 0.00 0.00 0.50
GIN 0.15 0.06 0.01 0.00 0.01 0.78
GLP 0.20 0.73 0.06 0.03 0.97 0.58
GMB 0.07 0.03 0.00 0.00 0.00 0.76
GNB 0.52 0.09 0.01 0.00 0.01 0.63
GNQ 0.00 0.00 0.00 0.00 0.00 0.50
GRC 4.57 0.51 0.15 0.02 0.46 0.95
GRD -0.15 0.18 0.02 0.01 0.05 0.71
GRL 0.00 0.00 0.00 0.00 0.00 0.50
GTM 0.32 0.05 0.02 0.00 0.00 0.98
GUF 0.00 0.01 0.00 0.00 0.00 0.78
GUM -0.15 0.17 0.01 0.01 0.05 0.57
GUY 0.48 0.03 0.01 0.00 0.00 0.93
HND 0.43 0.05 0.01 0.00 0.00 0.83
HRV -0.02 0.08 0.00 0.00 0.01 0.35
HTI 1.56 0.21 0.05 0.01 0.07 0.92
HUN 2.81 1.03 0.01 0.04 2.16 0.06
IDN 1.81 0.39 0.03 0.01 0.27 0.51
IND 7.38 2.11 0.22 0.10 7.77 0.98
IRL 0.00 0.01 0.00 0.00 0.00 0.47
IRN 3.02 0.12 0.04 0.00 0.02 0.97
IRQ 2.25 0.93 0.13 0.03 1.71 0.77
ISL 0.00 0.00 0.00 0.00 0.00 0.50
ISR 7.23 0.86 0.05 0.03 1.33 0.39
ITA 8.89 0.65 0.11 0.02 0.69 0.88
JAM 2.09 0.06 0.01 0.00 0.01 0.74
JOR 0.25 0.10 0.02 0.00 0.02 0.87
JPN 8.36 0.30 -0.01 0.01 0.14 0.37
KAZ 1.12 0.13 0.00 0.00 0.03 0.11
KEN 0.01 0.02 0.00 0.00 0.00 0.92
KGZ 4.84 0.14 0.02 0.00 0.03 0.79
KHM 0.12 0.20 0.04 0.01 0.07 0.90
KIR 0.00 0.00 0.00 0.00 0.00 0.50
KNA 0.05 0.00 0.00 0.00 0.00 0.70
KOR 7.65 0.84 0.04 0.03 1.30 0.31
KWT -0.09 0.08 0.01 0.00 0.01 0.82
LAO -0.12 0.18 0.03 0.01 0.05 0.87
LBN 4.83 0.59 0.12 0.02 0.65 0.90
LBR 0.01 0.01 0.00 0.00 0.00 0.82
LBY 0.06 0.03 0.01 0.00 0.00 0.88
LCA 0.72 0.84 0.10 0.03 1.36 0.70
LIE 0.00 0.00 0.00 0.00 0.00 0.50
LKA 5.81 0.41 0.08 0.01 0.27 0.91
LSO 0.02 0.02 0.00 0.00 0.00 0.76
LTU 0.37 0.23 0.00 0.01 0.10 0.09
LUX 0.00 0.00 0.00 0.00 0.00 0.69
LVA 0.21 0.11 0.00 0.00 0.02 0.14
MAR 1.89 0.10 0.03 0.00 0.02 0.96
MCO 0.00 0.00 0.00 0.00 0.00 0.50
MDA 3.23 0.88 0.15 0.03 1.57 0.84
MDG 0.44 0.17 0.04 0.01 0.05 0.92
MDV 0.00 0.00 0.00 0.00 0.00 0.50
MEX 1.49 0.12 0.05 0.00 0.02 0.98
MHL 0.00 0.00 0.00 0.00 0.00 0.50
MKD 2.12 0.69 0.08 0.02 0.90 0.74
MLI 0.01 0.05 0.00 0.00 0.00 0.55
MLT 1.16 1.24 0.11 0.04 3.23 0.51
MMR 0.63 0.28 0.05 0.01 0.14 0.88
MNG 0.00 0.01 0.00 0.00 0.00 0.91
MOZ 0.09 0.02 0.00 0.00 0.00 0.70
MRT 0.03 0.01 0.00 0.00 0.00 0.58
MTQ 0.76 0.73 0.11 0.03 0.92 0.83
MUS 4.87 0.67 0.12 0.02 0.85 0.87
MWI -0.08 0.09 0.01 0.00 0.01 0.85
MYS 0.72 0.05 0.01 0.00 0.00 0.93
NAM 0.00 0.00 0.00 0.00 0.00 0.90
NCL -0.02 0.08 0.01 0.00 0.01 0.79
NER 0.00 0.01 0.00 0.00 0.00 0.84
NGA 0.19 0.03 0.00 0.00 0.00 0.69
NIC 0.20 0.08 0.01 0.00 0.01 0.83
NLD 6.68 2.76 0.04 0.08 34.94 0.05
NOR 0.04 0.04 0.01 0.00 0.00 0.92
NPL 0.17 0.75 0.19 0.03 0.97 0.93
NRU 0.00 0.00 0.00 0.00 0.00 0.50
NZL 0.95 0.26 0.04 0.01 0.11 0.82
OMN 0.07 0.02 0.00 0.00 0.00 0.91
PAK 6.30 3.20 0.04 0.09 78.16 0.14
PAN 0.19 0.03 0.01 0.00 0.00 0.96
PER 1.18 0.04 0.00 0.00 0.00 0.70
PHL 2.23 0.23 0.09 0.01 0.09 0.97
PLW 0.00 0.00 0.00 0.00 0.00 0.50
PNG 0.00 0.00 0.00 0.00 0.00 0.50
POL 0.85 0.29 0.00 0.01 0.14 0.10
PRI 3.83 0.65 -0.01 0.02 0.76 0.10
PRK 4.19 1.16 0.18 0.04 2.83 0.82
PRT 8.08 1.13 0.00 0.04 2.14 0.07
PRY 0.09 0.02 0.00 0.00 0.00 0.83
PSE 2.85 0.18 0.02 0.01 0.05 0.66
PYF -0.07 0.10 0.01 0.00 0.01 0.69
QAT -0.16 0.17 0.03 0.01 0.05 0.88
REU 1.22 0.61 0.08 0.02 0.65 0.78
ROU 3.81 2.30 0.08 0.08 23.33 0.07
RUS 0.20 0.11 0.00 0.00 0.02 0.12
RWA 0.09 0.05 0.00 0.00 0.01 0.65
SAU 0.10 0.13 0.02 0.00 0.03 0.81
SDN 0.78 0.04 0.01 0.00 0.00 0.86
SEN 0.32 0.07 0.01 0.00 0.01 0.60
SGP 0.00 0.00 0.00 0.00 0.00 0.50
SLB 0.00 0.00 0.00 0.00 0.00 0.50
SLE 0.04 0.06 0.01 0.00 0.01 0.88
SLV 0.81 0.14 0.04 0.00 0.03 0.95
SMK 1.61 0.35 -0.01 0.01 0.22 0.11
SMR 0.00 0.00 0.00 0.00 0.00 0.50
SOM 0.14 0.04 0.00 0.00 0.00 0.80
STP 10.03 0.04 0.00 0.00 0.00 0.46
SUR 0.13 0.03 0.01 0.00 0.00 0.89
SVK 0.22 0.66 0.09 0.02 0.80 0.80
SVN -0.11 0.20 0.02 0.01 0.07 0.61
SWE 0.02 0.02 0.01 0.00 0.00 0.98
SWZ 1.95 0.10 0.02 0.00 0.02 0.93
SYC -0.15 0.20 0.01 0.01 0.07 0.46
SYR 1.62 0.77 0.11 0.03 1.12 0.79
TCA 0.00 0.00 0.00 0.00 0.00 0.50
TCD 0.00 0.00 0.00 0.00 0.00 0.90
TGO 0.00 0.03 0.00 0.00 0.00 0.75
THA 2.42 0.52 0.20 0.02 0.46 0.97
TJK 3.11 0.24 0.05 0.01 0.09 0.92
TKM 0.76 0.20 0.08 0.01 0.07 0.97
TLS 0.46 0.35 0.03 0.01 0.21 0.68
TON 0.00 0.00 0.00 0.00 0.00 0.50
TTO 0.39 0.11 0.01 0.00 0.02 0.70
TUN 0.69 0.12 0.05 0.00 0.02 0.98
TUR 1.24 0.22 0.12 0.01 0.09 0.99
TUV 0.00 0.00 0.00 0.00 0.00 0.50
TWN 13.98 0.99 0.00 0.03 0.94 0.09
TZA 0.02 0.02 0.00 0.00 0.00 0.95
UGA 0.01 0.01 0.00 0.00 0.00 0.84
UKR 1.26 0.56 0.08 0.02 0.58 0.81
URY 0.02 0.17 0.03 0.01 0.04 0.88
USA 1.90 0.12 0.03 0.00 0.03 0.93
UZB 6.09 0.62 0.10 0.02 0.66 0.84
VCT 1.22 0.60 0.04 0.02 0.68 0.46
VEN 0.19 0.03 0.01 0.00 0.00 0.98
VGB 0.00 0.00 0.00 0.00 0.00 0.50
VIR 0.02 0.12 0.01 0.00 0.02 0.61
VNM 1.91 1.08 0.22 0.04 2.27 0.89
VUT 0.00 0.00 0.00 0.00 0.00 0.50
WSM 0.00 0.00 0.00 0.00 0.00 0.50
YEM 0.27 0.13 0.02 0.00 0.03 0.82
ZAF 0.66 0.05 0.01 0.00 0.00 0.96
ZMB -0.04 0.04 0.01 0.00 0.00 0.79
ZWE 0.03 0.04 0.01 0.00 0.00 0.95

Plotting

abline <- 
  by_ISO %>% 
  unnest(data) %>% 
  
  ggplot(aes(x = yearcount, y = irrperc, group = ISO)) +
  geom_point() +
  geom_abline(data = mean_structure,
              aes(intercept = init_stat_est,
                  slope = rate_change_est, group = ISO),
              color = "blue") +
  scale_x_continuous() +
  coord_cartesian(ylim = c(0, 35)) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~ISO, ncol = 2)

ggsave("/Volumes/RachelExternal/Thesis/Thesis/abline.png", abline, height = 200,limitsize = FALSE, dpi = 300 )
## Saving 7 x 200 in image

Individual Country Fits Some of these countries don’t have fantastic fits. ALB, BGD, BGR, BRB, CUB, DNK, IND, NLD, PAK, ROU, have some issues.


Using No Priors

What if I do this again with no priors… and see if it fits better for the countries with more extreme increases in irrigated area over time. Ive run the same setup as above with the calculation of the mean, variance and bayesian \(R^2\). Below I’ve graphed the fits for the problem countries (ALB, BGD, BGR, BRB, CUB, DNK, IND, NLD, PAK, ROU). ### The No Prior Model

AFG_norm_nopri_d_yearcount <-
  brm(data = by_ISO$data[[2]],
      formula = irrperc ~ yearcount,
      control = list(adapt_delta = 0.99),
      iter = 4000, chains = 4, cores = 4,
      seed = 17,
      file = "/Volumes/RachelExternal/Thesis/Thesis/fits/AFG_norm_nopri_d_yearcount")

models_noprior <- 
  by_ISO %>%
  mutate(model = map(data, ~update(AFG_norm_nopri_d_yearcount, newdata = ., seed = 2)))

Calculation of Statistics

mean_structure_noprior <-
  models_noprior %>% 
  mutate(coefs = map(model, ~ posterior_summary(.)[1:2, 1:2] %>% 
                       data.frame() %>% 
                       rownames_to_column("coefficients"))) %>% 
  unnest(coefs) %>% 
  select(-data, -model) %>% 
  unite(temp, Estimate, Est.Error) %>% 
  pivot_wider(names_from = coefficients,
              values_from = temp) %>% 
  separate(b_Intercept, into = c("init_stat_est", "init_stat_sd"), sep = "_") %>% 
  separate(b_yearcount, into = c("rate_change_est", "rate_change_sd"), sep = "_") %>% 
  mutate_if(is.character, ~ as.double(.) %>% round(digits = 2)) %>% 
  ungroup()



residual_variance_noprior <-
  models_noprior %>% 
  mutate(residual_variance = map_dbl(model, ~ posterior_summary(.)[3, 1])^2) %>% 
  mutate_if(is.double, round, digits = 2) %>% 
  select(ISO, residual_variance)



r2_noprior <-
  models_noprior %>% 
  mutate(r2 = map_dbl(model, ~ bayes_R2(., robust = T)[1])) %>% 
  mutate_if(is.double, round, digits = 2) %>% 
  select(ISO, r2)

Plotting

abline_noprior <- 
  by_ISO %>% 
  subset(., c(ISO == "ALB"| ISO == "BGD"|ISO ==  "BGR"| ISO == "BRB"|
                ISO ==  "CUB"|ISO ==  "DNK"|ISO ==  "IND"|
                ISO ==  "NLD"|ISO ==  "PAK"|ISO ==  "ROU")) %>%
  unnest(data) %>% 
  ggplot(aes(x = yearcount, y = irrperc, group = ISO)) +
  geom_point() +
  geom_abline(data = subset(mean_structure_noprior,  c(ISO == "ALB"| 
                                                         ISO == "BGD"|
                                                         ISO ==  "BGR"| 
                                                         ISO == "BRB"|
                                                         ISO ==  "CUB"|
                                                         ISO ==  "DNK"|
                                                         ISO ==  "IND"|
                                                         ISO ==  "NLD"|
                                                         ISO ==  "PAK"|
                                                         ISO ==  "ROU")),
              aes(intercept = init_stat_est,
                  slope = rate_change_est, group = ISO),
              color = "blue") +
  scale_x_continuous() +
  coord_cartesian(ylim = c(0, 35)) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~ISO, ncol = 2)


ggsave("/Volumes/RachelExternal/Thesis/Thesis/abline_noprior.png", abline_noprior, dpi = 300 )

Individual Country Fits with no priors specified

Whoa, ok so these countries are fit way way better. This makes sense though, because if the model is being refit for each country then the priors can fluctuate much more. in the first run, I limited the priors to a pretty narrow slope that wasn’t helpful for countries that have expansion trajectories that increase quicker than the range I specified in the prior. If brms is behaving like I expect it to behave, its fitting a uniform prior for the slope and a student t distribution for the intercept, FOR EACH COUNTRY. I could go back and weaken the priors chosen for fit1 but what I’m doing here is not that important.

Prior vs. No Prior Fits

Lets check the table, first the problem countries fit without a prior.

ISO init_stat_est init_stat_sd rate_change_est rate_change_sd residual_var r2
ALB 8.01 2.19 0.15 0.07 7.76 0.48
BGD -0.77 2.50 0.80 0.08 11.02 0.95
BGR 9.77 2.85 -0.13 0.10 14.53 0.26
BRB 0.29 1.92 0.27 0.07 6.36 0.80
CUB 3.05 1.07 0.14 0.04 1.89 0.80
DNK 1.56 1.70 0.24 0.06 5.12 0.80
IND 8.29 0.54 0.30 0.02 0.46 0.99
NLD 10.04 3.68 0.09 0.12 22.47 0.13
PAK 14.94 0.81 0.16 0.03 1.07 0.91
ROU 2.85 3.78 0.18 0.13 25.90 0.26

And now the original fit, with the priors.

ISO init_stat_est init_stat_sd rate_change_est rate_change_sd residual_var r2
ALB 7.09 2.12 0.09 0.07 12.22 0.27
BGD 3.10 2.99 0.09 0.10 143.96 0.02
BGR 7.51 2.14 -0.07 0.07 12.27 0.10
BRB 1.58 1.65 0.19 0.06 6.94 0.63
CUB 3.23 0.87 0.13 0.03 1.58 0.78
DNK 2.42 1.45 0.19 0.05 5.22 0.69
IND 7.38 2.11 0.22 0.10 7.77 0.98
NLD 6.68 2.76 0.04 0.08 34.94 0.05
PAK 6.30 3.20 0.04 0.09 78.16 0.14
ROU 3.81 2.30 0.08 0.08 23.33 0.07

Yeah for all of these countries the fit has been improved, by visual inspection and the bayes \(R^2\) by disregarding the priors and allowing the model to fit with default priors for each country.


Feature Selection: ggpairs()

We can check out the data using ggpairs(). What I am really looking for is the relationship with irrperc.

The Basics

#includes log_income_std, log_pop_std, precip_sc, rugged_sc
aei %>% 
  select(irrperc, log_income_std, log_pop_std, precip_sc, rugged_sc) %>% 
  ggpairs()

So from the first plot it seems that standardized log population and a scaled ruggedness seem to be somewhat correlated with irrperc. But these pairs() plots can be hard to interpret sometimes, just trying to get a first look.

I’ve plotted all of the transformed precipitation features because I am a little unclear on which I should use, and wanted to see if any of them seemed connected to trends with irrprec.

#only precipitation features are plotted here. 
aei %>% 
  select(irrperc, precip_sc, precip_std, log_precip_sc, log_precip_std) %>% 
  ggpairs()

Hmmmm, none of them seem to clearly be correlated with irrperc. Perhaps percipitation is not the metric to use.. may have to use something else.

More!

aei %>% 
  select(irrperc, Longitude, Latitude, yearcount) %>% 
  ggpairs()

Ok, higher latitudes have more irrigation than southern ones. Also, apparently eastern latitudes (pos latitudes) are linked to higher irrigation percents. But both of these are just characteristics of land mass not something continuous. and when you look at lat vs. lon you get a simplified earth! Year count also (as expected) has something to do with irrperc.

Irrigated Crops

Lets check out the crops. Ill just take the irrigated crops first and compare them with irrperc to see if any of them seem correlated.

#irrigated crops
#designated with .1
aei %>% 
  select(irrperc, c(46:50)) %>% 
  ggpairs( pch = 18)

#irrigated crops
aei %>% 
  select(irrperc, c(51:54)) %>% 
  ggpairs( pch = 18)

#irrigated crops
aei %>% 
  select(irrperc, c(55:58)) %>% 
  ggpairs( pch = 18)

I am just looking at the numbers in the first row, and the graphs in the first col. Nothing appears to jump out here, ggpairs() uses significance which can be misleading here. But countries with higher or lower amounts of irrigation don’t seem to have obvious correlations with crop types, either positive or negative.

#irrigated crops
aei %>% 
  select(precip_sc, c(46:50)) %>% 
  ggpairs( pch = 18)

#irrigated crops
aei %>% 
  select(precip_sc, c(51:54)) %>% 
  ggpairs( pch = 18)

#irrigated crops
aei %>% 
  select(precip_sc, c(55:58)) %>% 
  ggpairs( pch = 18)

Not sure if it makes too much sense to compare irrigated crops to rainfall as I am unclear of how these crops are modeled inside LPJmL. The most notable correlation coeff listed here is for temperate cereals. The graphs here again don’t reveal much.

Rainfed Crops

#rainfed crops
aei %>% 
  select(irrperc, c(32:36)) %>% 
  ggpairs( pch = 18)

#rainfed crops
aei %>% 
  select(irrperc, c(37:41)) %>% 
  ggpairs( pch = 18)

#rainfed crops
aei %>% 
  select(irrperc, c(42:46)) %>% 
  ggpairs( pch = 18)

#rainfed crops
aei %>% 
  select(precip_sc, c(32:36)) %>% 
  ggpairs( pch = 18)

#rainfed crops
aei %>% 
  select(precip_sc, c(37:41)) %>% 
  ggpairs( pch = 18)

#rainfed crops
aei %>% 
  select(precip_sc, c(42:46)) %>% 
  ggpairs( pch = 18)

Early Gaussian Models

Ok lets specify something and see if we can get it to run. Well use irrperc as we haven’t moved into zero inflated land. This is gaussian with no priors. Gaussian is not gonna work with this. But… let’s see.

fit3.1 <-
  brm(data = aei,
      formula = irrperc ~ yearcount + Latitude + Longitude + log_pop_std + rugged_sc,
      control = list(adapt_delta = 0.99),
      iter = 4000, chains = 4, cores = 4,
      seed = 17,
      file = "/Volumes/RachelExternal/Thesis/Thesis/fits/fit3.1")

plot(fit3.1,  ask = FALSE, nrow = 3)

summary(fit3.1)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: irrperc ~ yearcount + Latitude + Longitude + log_pop_std + rugged_sc 
##    Data: aei (Number of observations: 1448) 
## Samples: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
##          total post-warmup samples = 8000
## 
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept      -8.09      0.67    -9.40    -6.75 1.00     7244     5814
## yearcount       0.03      0.01     0.02     0.04 1.00     8097     5431
## Latitude        0.03      0.00     0.02     0.04 1.00     8375     6298
## Longitude       0.01      0.00     0.01     0.01 1.00    13893     5435
## log_pop_std     8.36      0.66     7.04     9.64 1.00     6922     5751
## rugged_sc       3.38      0.57     2.27     4.47 1.00     6766     5251
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     3.63      0.07     3.50     3.76 1.00     7646     5398
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
prior_summary(fit3.1)
##                   prior     class        coef group resp dpar nlpar bound
##                  (flat)         b                                        
##                  (flat)         b    Latitude                            
##                  (flat)         b log_pop_std                            
##                  (flat)         b   Longitude                            
##                  (flat)         b   rugged_sc                            
##                  (flat)         b   yearcount                            
##  student_t(3, 0.5, 2.5) Intercept                                        
##    student_t(3, 0, 2.5)     sigma                                        
##        source
##       default
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##       default
##       default
pp_check(fit3.1)
## Using 10 posterior samples for ppc type 'dens_overlay' by default.

I know that this is not gonna work out. Lets switch to the Zero-inflated models. All I am doing here is getting frustrated.